Abstract — The crosstalk delay associated with global on-chip 
interconnects becomes more severe in deep submicron technology, 
and hence can greatly affect the overall system performance. 
Based on a delay model proposed by Sotiriadis et al., transition 
patterns over a bus can be classified according to their delays. 
Using this classification, crosstalk avoidance codes (CACs) have 
been proposed to alleviate the crosstalk delays by restricting the 
transition patterns on a bus. In this paper, we first propose a 
new classification of transition patterns, and then devise a new 
family of CACs based on this classification. In comparison to the 
previous classification, our classification has more classes and 
the delays of its classes do not overlap, both leading to more 
accurate control of delays. Our new family of CACs includes 
some previously proposed codes as well as new codes with 
reduced delays and improved throughput. Thus, this new family 
of crosstalk avoidance codes provides a wider variety of tradeoffs 
between bus delay and efficiency. Finally, since our analytical 
approach to the classification and CACs treats the 
technology-dependent parameters as variables, our approach can be 
easily adapted to a wide variety of technologies. 

Index Terms — Crosstalk avoidance codes, delay, interconnects 



I. INTRODUCTION 

THE recent International Technology Roadmap for Semiconductors (ITRS) [1] has shown a troubling trend: while gate delay decreases with scaling, global wire delay increases. As process technologies scale down into the deep submicrometer (DSM) regime, the crosstalk delay becomes dominant in global wire delay because of the increasing coupling capacitance between adjacent wires. Hence, the crosstalk delay has become a serious bottleneck for overall system performance. 

The analytical model proposed by Sotiriadis et al. [2], [3], a widely used delay model, gives upper bounds on the delay of all wires on a bus. According to [2], [3], the delay of the k-th wire (k ∈ {1, 2, ..., m}) of an m-bit bus is given by 

\[
T_k = \begin{cases}
\tau_0[(1+\lambda)\Delta_1^2 - \lambda\Delta_1\Delta_2], & k = 1, \\
\tau_0[(1+2\lambda)\Delta_k^2 - \lambda\Delta_k(\Delta_{k-1}+\Delta_{k+1})], & k \neq 1, m, \\
\tau_0[(1+\lambda)\Delta_m^2 - \lambda\Delta_m\Delta_{m-1}], & k = m,
\end{cases} \tag{1}
\]

where λ is the ratio of the coupling capacitance between adjacent wires and the ground capacitance, τ_0 is the propagation delay of a wire free of crosstalk, and Δ_k is 1 for a 0 → 1 transition, −1 for a 1 → 0 transition, or 0 for no transition on the k-th wire. In this model, the delay of the k-th wire depends 
on the transition patterns of at most three wires only: k − 1, k, and k + 1. The transition patterns over these three wires can be classified based on Eq. (1) into five classes, denoted by D_i for i = 0, 1, 2, 3, 4, and the patterns in D_i have a worst-case delay of (1 + iλ)τ_0. This classification enables one to limit the worst-case delay over a bus by restricting the patterns transmitted on the bus. That is, by avoiding all transition patterns in D_i for i > i_0, one can achieve a worst-case delay of (1 + i_0 λ)τ_0 over the bus. Based on this principle, crosstalk avoidance codes (CACs) of different worst-case delays have been proposed (see, for example, [4]-[6]). For example, forbidden overlap codes (FOCs), forbidden transition codes (FTCs), forbidden pattern codes (FPCs), and one lambda codes (OLCs) achieve a worst-case delay of (1 + 3λ)τ_0, (1 + 2λ)τ_0, (1 + 2λ)τ_0, and (1 + λ)τ_0, respectively. Based on Eq. (1), a worst-case delay of τ_0 can be achieved by assigning two protection wires to each data wire [?]. Other types of CACs, such as those with equalization [7] or two-dimensional CACs [8], have been proposed in the literature. For CACs, since the area and power consumption of their encoders/decoders (CODECs) are all overhead, the complexity of the CODECs is important to the effectiveness of CACs. Thus, efficient CODECs have been proposed for CACs [9]-[11]. 
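
The classification based on Eq. (1) can be illustrated with a short Python sketch. It evaluates the middle-wire case of Eq. (1) for the nine three-wire patterns whose middle wire rises and groups them by the index i of the bound (1 + iλ)τ_0. The function names and the integer encoding of transitions (+1 for 0 → 1, −1 for 1 → 0, 0 for no transition) are illustrative assumptions, not notation from the original papers.

```python
from itertools import product

def wire_delay(left, mid, right, lam, tau0=1.0):
    """Upper bound of Eq. (1) for a wire with two neighbours (k != 1, m)."""
    return tau0 * ((1 + 2 * lam) * mid ** 2 - lam * mid * (left + right))

def delay_class(left, mid, right):
    """Index i of class D_i, i.e. the bound equals (1 + i*lam)*tau0."""
    if mid == 0:
        raise ValueError("the middle wire must switch")
    return 2 * mid * mid - mid * (left + right)

# Group all patterns with a rising middle wire by their class index.
classes = {}
for left, right in product((1, 0, -1), repeat=2):
    classes.setdefault(delay_class(left, 1, right), []).append((left, 1, right))

for i in sorted(classes):
    print(f"D{i}: {classes[i]}")
```

Restricting a code to patterns with class index at most i_0 is then a matter of rejecting every codeword pair that produces a pattern with a larger index.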

The classification of transition patterns based on the model in [2], [3] has two drawbacks. First, the model in [2], [3] has limited accuracy because of its dependence on only three wires: the model overestimates the delays of patterns in D1 through D4, while it underestimates the delays of patterns in D0. For this reason, the scheme proposed to achieve a worst-case delay of τ_0 is invalid, since its actual delay is much greater. Second, the actual delay ranges of some classes overlap with those of others. This, plus the overestimation of delays for D1 through D4, implies that the delays of existing CACs are not tightly controlled. These drawbacks motivate us to include more wires and to classify the transition patterns without overlapping delay ranges. 

In [12], we proposed a new analytical five-wire delay model. Two extra neighboring wires are included in the delay model [12], and the delay of the middle wire of five neighboring wires is determined by the transition patterns on all five wires. This five-wire model has better accuracy than the model in [2], [3] for D_i, i = 0, 1, 2, 3, 4 [12]. This work confirms that using more wires leads to improved accuracy. 

There are two main contributions in this paper: 

• First, we approximate the crosstalk delay in a five-wire 
model and propose a new classification of transition 
patterns. 

• Second, we propose a family of CACs based on our 
classification. 

The work in this paper is different from previous works, 
including our previous works, in several aspects: 

• First, although the delay approximation in this paper is also based on a five-wire model, it is different from that in our previous work [12]. The delay approximation in this paper is carried out by extending the approach in [13] from a three-wire model to a five-wire one. 

• Second, our classification of transition patterns is different from that in [2], [3] (based on Eq. (1)) in two aspects. First, our classification has seven classes as opposed to five based on Eq. (1). Second, while the delays of some classes overlap for the classification based on Eq. (1), all classes in our classification have non-overlapping delays. These two key differences allow a more accurate control of delays for transition patterns. 

• Third, our new family of CACs is also different from previously proposed CACs, all of which are based on the classification in [2], [3] (based on Eq. (1)). While some codes in this new family are shown to be the same as existing CACs (OLCs, FPCs, and FOCs), this family also includes new codes that achieve smaller worst-case delays and higher throughput than OLCs, which have the smallest worst-case delays among all existing CACs. 

The rest of the paper is organized as follows. In Section II, we first propose our classification and compare it with that in [2], [3]. We then present our new family of CACs in Section III and compare their performance with existing CACs in Section IV. Some concluding remarks are provided in Section V. 

II. INTERCONNECT DELAYS AND 
CLASSIFICATION 

A. Interconnect Modeling 

Since the functionality and performance in DSM technology are greatly affected by the parasitics, distributed RC models are widely employed to analyze on-chip interconnects. In this paper, we consider the distributed RC model of five wires shown in Fig. 1, where V_i(x, t) denotes the transient signal at time t and position x (0 ≤ x ≤ L) over wire i for i ∈ {1, 2, 3, 4, 5}, and r and c denote the resistance and ground capacitance per unit length, respectively. Also, λc denotes the coupling capacitance per unit length between two adjacent wires. The value of λ depends on many factors, such as the metal layer in which we route the bus, the wire width, the spacing between adjacent wires, and the distance to the ground layer. We consider a uniformly distributed bus with the same parameters r, c, and λ for all the wires. 

B. Derivation of Closed-form Expressions 

When determining the delay of a wire, the model in [2], [3] considers only the effects of either one or two neighboring wires (cf. Eq. (1)). To address the drawbacks of the model in [2], [3] described above, additional neighboring wires need to be accounted for. In our delay derivation below, whenever possible we consider four neighboring wires of a wire, two neighboring wires on each side, to determine its delay. To approximate the delay of a side wire (wires 1, 2, n − 1, or n) of an n-wire bus, three neighboring wires are considered. This is because the side wires are affected by fewer neighboring wires. This scheme is similar to the model in [2], [3] and appears to work well. We focus on the 50% delay, which is defined as the time required for the unit step response to reach 50% of its final value. 

Fig. 1. A distributed RC model for five wires. 

In [13], the crosstalk of two coupled lines was described by partial differential equations (PDEs), and a technique for decoupling these highly coupled PDEs was introduced by using eigenvalues and corresponding eigenvectors. In our work, we extend this approach from a three-wire model to a five-wire one. Specifically, we first use the technique in [13] to decouple the PDEs that describe the crosstalk of four coupled wires, then solve these independent PDEs for closed-form expressions, and finally approximate the delays of each wire. 

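The PDEs characterizing five wires with length L are given by 

\[
\frac{\partial^2}{\partial x^2}\mathbf{V}(x,t) = \mathbf{R}\mathbf{C}\,\frac{\partial}{\partial t}\mathbf{V}(x,t), \tag{2}
\]

where 𝐑 = diag{r r r r r}, 𝐕(x, t) = [V_1(x, t) V_2(x, t) V_3(x, t) V_4(x, t) V_5(x, t)]^T, and 

\[
\mathbf{C} = c\begin{bmatrix}
1+\lambda & -\lambda & 0 & 0 & 0 \\
-\lambda & 1+2\lambda & -\lambda & 0 & 0 \\
0 & -\lambda & 1+2\lambda & -\lambda & 0 \\
0 & 0 & -\lambda & 1+2\lambda & -\lambda \\
0 & 0 & 0 & -\lambda & 1+\lambda
\end{bmatrix}.
\]

The eigenvalues of 𝐂/c are given by p_1 = 1, p_2 = 1 + ((5 − √5)/2)λ, p_3 = 1 + ((5 + √5)/2)λ, p_4 = 1 + ((3 + √5)/2)λ, and p_5 = 1 + ((3 − √5)/2)λ. Their corresponding eigenvectors e_i are given by e_1 = [1 1 1 1 1]^T, e_2 = [−(√5 + 1)/4 (√5 − 1)/4 1 (√5 − 1)/4 −(√5 + 1)/4]^T, e_3 = [(√5 − 1)/4 −(√5 + 1)/4 1 −(√5 + 1)/4 (√5 − 1)/4]^T, e_4 = [−1 (1 + √5)/2 0 −(1 + √5)/2 1]^T, and e_5 = [−1 −(√5 − 1)/2 0 (√5 − 1)/2 1]^T, respectively. 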
With a technique for decoupling partial differential equations similar to [13], Eq. (2) is transformed into 

\[
\frac{\partial^2}{\partial x^2}U_i(x,t) = r c p_i\,\frac{\partial}{\partial t}U_i(x,t), \quad \text{for } i = 1, 2, 3, 4, 5, \tag{3}
\]

where U_i(x, t) = 𝐕^T(x, t)e_i denotes the transformed signals. The decoupled PDEs in Eq. (3) are independent of each other. Each U_i(x, t) describes a single wire with a modified capacitance c p_i. The solution U_i(L, t) is given by a series of the form U_i(L, t) = V_dd + Σ_{k=0}^{∞} r_k e^{s_k t}. As shown in [13], a single-exponent approximation (V_dd + r_0 e^{s_0 t}) is enough for t/τ > 0.1, where r_0 and s_0 are the coefficients of the most significant term. 

For different transitions, we solve Eq. (3) for U_i(x, t) and obtain V_3(L, t) = (1/5)[U_1(L, t) + 2U_2(L, t) + 2U_3(L, t)], which is given by a sum of a constant and three exponent terms (V_dd − c_0 e^{−t/(a_0 τ)} − c_1 e^{−t/(a_1 τ)} − c_2 e^{−t/(a_2 τ)}). Then the 50% delay of wire 3 can be evaluated by solving V_3(L, t) = 0.5V_dd. 
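
Solving for the 50% delay of an output signal that is a constant minus a few decaying exponentials has no closed form in general, but it is easy to do numerically. The following is a minimal sketch using bisection, assuming the signal has the form V(t) = V_dd − Σ c_i e^{−t/(a_i τ)}, starts below 0.5V_dd, and crosses that level once. The function name and argument layout are hypothetical; the paper itself reports using MATLAB for this step.

```python
import math

def fifty_percent_delay(coeffs, scales, tau, vdd=1.0):
    """Solve V(t) = 0.5*vdd for V(t) = vdd - sum(c_i * exp(-t / (a_i * tau))).

    Assumes V(0) < 0.5*vdd and a single crossing of the 50% level.
    """
    def v(t):
        return vdd - sum(c * math.exp(-t / (a * tau))
                         for c, a in zip(coeffs, scales))

    lo, hi = 0.0, tau
    while v(hi) < 0.5 * vdd:      # grow the bracket until the 50% level is crossed
        hi *= 2.0
    for _ in range(200):          # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if v(mid) < 0.5 * vdd:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Single-exponential sanity check: 1 - exp(-t) reaches 0.5 at t = ln 2.
print(fifty_percent_delay([1.0], [1.0], tau=1.0))
```

Because t only enters through the ratio t/τ, doubling τ doubles the returned delay, which is the scaling argument used later for the classification.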

For side wires, the PDEs characterizing four wires with length L are given by 

\[
\frac{\partial^2}{\partial x^2}\mathbf{V}(x,t) = \mathbf{R}\mathbf{C}\,\frac{\partial}{\partial t}\mathbf{V}(x,t), \tag{4}
\]

where 𝐑 = diag{r r r r}, 𝐕(x, t) = [V_1(x, t) V_2(x, t) V_3(x, t) V_4(x, t)]^T, and 

\[
\mathbf{C} = c\begin{bmatrix}
1+\lambda & -\lambda & 0 & 0 \\
-\lambda & 1+2\lambda & -\lambda & 0 \\
0 & -\lambda & 1+2\lambda & -\lambda \\
0 & 0 & -\lambda & 1+\lambda
\end{bmatrix}.
\]

The eigenvalues of 𝐂/c are given by p_1 = 1, p_2 = 1 + (2 − √2)λ, p_3 = 1 + 2λ, and p_4 = 1 + (2 + √2)λ. Their corresponding eigenvectors e_i are given by e_1 = [1 1 1 1]^T, e_2 = [−1 (1 − √2) −(1 − √2) 1]^T, e_3 = [1 −1 −1 1]^T, and e_4 = [−1 (1 + √2) −(1 + √2) 1]^T, respectively. 
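
The four-wire eigenvalues and eigenvectors can be checked numerically. The sketch below builds the tridiagonal matrix 𝐂/c for a given coupling factor and reports the largest residual of (𝐂/c)e_i − p_i e_i over the four pairs. The helper names are illustrative and the check uses only the Python standard library.

```python
import math

def coupling_matrix(lam):
    """C/c for four uniformly coupled wires (tridiagonal)."""
    return [
        [1 + lam, -lam, 0.0, 0.0],
        [-lam, 1 + 2 * lam, -lam, 0.0],
        [0.0, -lam, 1 + 2 * lam, -lam],
        [0.0, 0.0, -lam, 1 + lam],
    ]

def eigenpairs(lam):
    """Eigenvalue/eigenvector pairs (p_i, e_i) as stated in the text."""
    r2 = math.sqrt(2.0)
    return [
        (1.0, [1, 1, 1, 1]),
        (1 + (2 - r2) * lam, [-1, 1 - r2, -(1 - r2), 1]),
        (1 + 2 * lam, [1, -1, -1, 1]),
        (1 + (2 + r2) * lam, [-1, 1 + r2, -(1 + r2), 1]),
    ]

def residual(lam):
    """Largest entry of |(C/c) e - p e| over all four eigenpairs."""
    m = coupling_matrix(lam)
    worst = 0.0
    for p, e in eigenpairs(lam):
        for row, e_k in zip(m, e):
            worst = max(worst, abs(sum(a * b for a, b in zip(row, e)) - p * e_k))
    return worst

print(residual(12.24))  # close to zero, up to floating-point rounding
```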

By decoupling the PDEs in Eq. (4), we have 

\[
\frac{\partial^2}{\partial x^2}U_i(x,t) = r c p_i\,\frac{\partial}{\partial t}U_i(x,t), \quad \text{for } i = 1, 2, 3, 4. \tag{5}
\]

The expressions of wires 1 and 2 are given by V_1(L, t) = (1/4)U_1(L, t) − ((2 + √2)/8)U_2(L, t) + (1/4)U_3(L, t) − ((2 − √2)/8)U_4(L, t) and V_2(L, t) = (1/4)U_1(L, t) − (√2/8)U_2(L, t) − (1/4)U_3(L, t) + (√2/8)U_4(L, t), respectively. Then the 50% delays of wires 1 and 2 can be evaluated by solving V_i(L, t) = 0.5V_dd for i = 1, 2. 

C. Pattern Classification 

First, we consider the classification of transition patterns over five wires with respect to the delay of the middle wire (wire 3). In this paper, we use "↑" to denote a transition from 0 to the supply voltage V_dd (normalized to 1), "-" no transition, and "↓" a transition from V_dd to 0. We first focus on patterns with a ↑ transition on wire 3 in a five-wire bus and derive V_3(L, t) for each pattern as described in Sec. II-B. There are 3^4 = 81 different transition patterns, which can be partitioned into 25 subclasses according to the expressions of the output signals on wire 3: all transition patterns in each subclass have the same expression V_3(L, t). The expressions of all 25 subclasses are shown in Tab. I. Then the expressions V_3(L, t) of all patterns in the 25 subclasses are evaluated for their 50% delays. By grouping subclasses with close delays into one class, we divide the 81 transition patterns into seven classes Ci for i = 0, 1, ..., 6, shown in Tab. I. For all 25 subclasses, simulated delays are also provided in Tab. I. For all seven classes, the difference between the evaluated delay and the simulated delay in Tab. I is small. 

Fig. 2. Delays of the middle wire for all patterns with respect to λ in a five-wire bus (τ_0 = 1.42 ps). 

All evaluations and simulations are based on a FreePDK 45 nm CMOS technology with 10 metal layers [14]. We assume that the top two metal layers, layers 9 and 10, are used for routing global interconnects, and that metal layer 8 is used as the ground layer. An interconnect model in [15] is used for parasitic extraction. For a 5 mm bus in the top metal layer, the key parasitics (resistance, ground capacitance, and coupling capacitance) are given by R = 68.75 Ω, C_gnd = 41.32 fF, and C_couple = 505.68 fF, respectively. The bus is modeled by a distributed RC model as shown in Fig. 1 with 100 segments. The two important parameters used in our delay approximation are τ_0 = 0.5RC_gnd = 1.42 ps and λ = C_couple/C_gnd = 12.24. Since the crosstalk delay on the bus constitutes a major part of the whole delay, the delays introduced by buffers are ignored. We assume that ideal step signals are applied on the bus directly. The closed-form expressions are evaluated for 50% delays via MATLAB and the simulation is done by HSPICE. 
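
The two model parameters follow directly from the extracted parasitics. As a quick check, the short Python sketch below recomputes τ_0 and λ from the values quoted above; the variable names are illustrative.

```python
R = 68.75               # wire resistance in ohms
C_GND = 41.32e-15       # ground capacitance in farads
C_COUPLE = 505.68e-15   # coupling capacitance in farads

tau0 = 0.5 * R * C_GND  # propagation delay of a crosstalk-free wire, in seconds
lam = C_COUPLE / C_GND  # coupling factor (dimensionless)

print(f"tau0 = {tau0 * 1e12:.2f} ps, lambda = {lam:.2f}")
```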

From Tab. I it can be easily verified that C5 and C6 are the same as D3 and D4 in [2], [3], respectively. That is, the middle three wires of the transition patterns in C5 (C6, respectively) constitute D3 (D4, respectively). The transition patterns in D0, D1, and D2 are divided into five classes C0 to C4 in our classification with the following relations: C4 ⊂ D2, C3 ⊂ D1 ∪ D2, C2 ⊂ D0 ∪ D1, C1 ⊂ D0 ∪ D1 ∪ D2, and C0 ⊂ D0 ∪ D1. 

Note that the coefficients c_i for i = 0, 1, 2 of the expression of wire 3 are independent of technology and determined by the transition pattern. For a given pattern, the coefficients c_i are fixed and the delay is a function of τ_0 and λ. Since the ratio t/τ_0 appears in the exponent terms, varying τ_0 scales the delays in all classes by the same factor. Thus, the classification does not depend on τ_0. The coupling factor λ, however, could affect the delays of different patterns differently. In the following, we verify our classification for technologies with different coupling factors, λ = 1, 2, ..., 13, and show the results in Fig. 2. Different classes are denoted by different line styles. Each class contains multiple lines, each of which represents a subclass. Patterns in each subclass have the same delay. For λ ≥ 3, the ranges of delays in all classes do not overlap. Also, the delay in each subclass increases linearly with λ. This implies that our classification is valid provided that the coupling factor λ is at least 3. 
TABLE I 

Closed-form expressions for the output signals on wire 3 in a five-wire bus with evaluated and simulated 50% delays (τ_0 = 1.42 ps, τ = (8/π²)τ_0, λ = 12.24, a_0 = 1, a_1 = 1 + ((5 − √5)/2)λ, and a_2 = 1 + ((5 + √5)/2)λ for all classes). 
Then, we consider the classification of transition patterns over four wires with respect to the delays of the side wires. We classify patterns by considering the worst-case delays of wires 1 and 2, respectively. Note that the classification with respect to the delays of wires 4 and 5 would be the same by symmetry. We first focus on patterns with a ↑ transition on wire 2 in a four-wire bus. There are 3^3 = 27 different transition patterns. As described in Sec. II-B, we first derive the expressions V_2(L, t) of these 27 patterns, shown in Tab. II. By evaluating these patterns for their 50% delays, we group patterns with close delays into one class, and form five classes jC for j = 0, 1, 2, 3, 4 as shown in Tab. II. Then, we focus on patterns with a ↑ transition on wire 1. There are 3^3 = 27 different transition patterns. As described in Sec. II-B, we first derive the expressions V_1(L, t) of these 27 patterns, shown in Tab. III. By evaluating these patterns for their 50% delays, we group patterns with close delays into one class, and form three classes jC for j = 0, 1, 2 as shown in Tab. III. When both wires 1 and 2 have transitions, the delay on wire 2 is larger than that of wire 1, which can be verified from Tabs. II and III. In this case, we focus on the delay of wire 2. When only wire 1 has a transition, we focus on the delay of wire 1. The difference between the evaluated delay and the simulated delay is small, as shown in Tabs. II and III, with one exception (one pattern in 1C in Tab. II), which does not change our classification. 

From Tabs. II and III, the classes 3C and 4C of our classification are exactly the same as D3 and D4 in [2], [3], respectively. The classes 1C and 2C of our classification are subsets of D1 and D2 in [2], [3], respectively. The class 0C is a subset of D0 ∪ D1 in [2], [3]. 

Similar to the classification of middle wires, we conclude that the classification on side wires does not depend on τ_0. To verify our classification for technologies with different coupling effects, we consider coupling factors λ = 1, 2, ..., 13, and show the results in Fig. 3. Each class contains multiple lines, each of which represents a pattern in Tabs. II and III. For λ ≥ 1, the ranges of delays in all classes do not overlap. Also, the delay in each subclass increases linearly with λ. This implies that our classification on side wires is valid provided that the coupling factor λ is at least 1. 

TABLE II 

Closed-form expressions for the output signals on wire 2 in a four-wire bus with evaluated and simulated 50% delays (τ_0 = 1.42 ps, τ = (8/π²)τ_0, λ = 12.24, a_0 = 1, a_1 = 1 + (2 − √2)λ, a_2 = 1 + 2λ, and a_3 = 1 + (2 + √2)λ for all classes). 

In addition to being a finer classification, the new classification has no overlapping delays among different classes. Fig. 4 compares the simulated delays of different classes based on the classification in [2], [3] and our new classification. In Fig. 4, the grey bars identify the minimum and maximum simulated delays in every class. Note that only the two extremes are important, and not all delay values in the grey bars are achievable by some transition pattern. In Fig. 4(a), the thick line segments denote the upper bounds for the delay of each class based on Eq. (1). The upper bounds given by the model in [2], [3] overestimate the delays of D1 through D4 and underestimate the delay of D0. As shown in Fig. 4(a), the actual delays in D0, D1, and D2 overlap with each other. Some patterns with smaller delays have the potential to transmit information at a higher speed, but are categorized into a class with a larger delay bound. Thus, the classification by the model in [2], [3] does not result in effective crosstalk avoidance codes. In contrast, the delays of different classes in our new classification do not overlap, as shown in Fig. 4(b), 4(c), and 4(d). By classifying patterns this way, we have a more accurate control of delays for transition patterns. 

Fig. 3. Delays of side wires for all patterns with respect to λ in a four-wire bus (τ_0 = 1.42 ps). 

TABLE III 

Closed-form expressions for the output signals on wire 1 in a four-wire bus with evaluated and simulated 50% delays (τ_0 = 1.42 ps, τ = (8/π²)τ_0, λ = 12.24, a_0 = 1, a_1 = 1 + (2 − √2)λ, a_2 = 1 + 2λ, and a_3 = 1 + (2 + √2)λ for all classes). 

III. NEW MEMORYLESS CROSSTALK AVOIDANCE 
CODES 

A. Previous CAC Design 

CACs reduce the crosstalk delay for on-chip global interconnects by encoding a k-bit data word (x_1 x_2 ... x_k) into an n-bit (n > k) codeword (c_1 c_2 ... c_n). Two kinds of CACs, CACs with memory and memoryless CACs, have been investigated in the literature. CACs with memory, as shown in Fig. 5(a), need to store all codebooks corresponding to different codewords (c_1 c_2 ... c_n), since the encoding depends on the data word (x_1 x_2 ... x_k) as well as the preceding codeword. In contrast, memoryless CACs, as shown in Fig. 5(b), require a single codebook to generate codewords for transmission, because the encoding depends on the data word only. Hence, memoryless CACs are simpler to implement than CACs with memory. We focus on memoryless CACs in this paper. 

The codebook of a memoryless CAC satisfies the property that each codeword must be able to transition to every other codeword in the codebook with a delay within the requirement. Most memoryless CACs in the literature are based on the model in [2], [3]. The key idea is to eliminate undesirable patterns for transmission. Existing memoryless CACs include OLCs, FPCs, FTCs, and FOCs [4]-[6], [16], which achieve a worst-case delay of (1 + λ)τ_0, (1 + 2λ)τ_0, (1 + 2λ)τ_0, and (1 + 3λ)τ_0, respectively. As mentioned above, the scheme that was proposed to achieve a worst-case delay of τ_0 is invalid since the model in [2], [3] underestimates the delays for 0C. Thus, OLCs achieve the smallest worst-case delay, (1 + λ)τ_0, among existing CACs. 

Fig. 4. Simulated delays of different classes of transition patterns using (a) classification based on Eq. (1); (b) classification with respect to the delay of the middle wire in a five-wire bus; (c) classification with respect to the delay of wire 2 in a four-wire bus; (d) classification with respect to the delay of wire 1 in a four-wire bus (λ = 12.24 and τ_0 = 1.42 ps). 

There exist several methods to obtain a memoryless codebook, based on pattern pruning, transition pruning, or recursive construction. The pattern pruning technique is quite straightforward, and gives a codebook with a smaller worst-case delay by eliminating some patterns. For example, FOCs cannot have both 010 and 101 patterns around any bit position, and FPCs are free of 010 and 101 patterns [16]. The transition pruning technique [6] is based on graph theory. This method first builds a transition graph with all possible codewords as nodes and all valid transitions as edges, and then finds a maximum clique. A clique is defined as a subgraph in which every pair of nodes is connected by an edge. A maximum clique is defined as a clique of the largest possible size in a given graph. Since every pair of nodes is connected, a maximum clique in this graph constitutes a memoryless codebook with the largest size. The codebook generation method is based on exhaustive search. Although it is easy to get a maximum clique from a transition graph with a small n, the complexity increases rapidly with n. This is because the number of edges in an n-bit transition graph is upper bounded by 2^{n−1}(2^n − 1), which increases exponentially with n. In fact, finding a maximum clique for given constraints is an NP-hard problem [17]. The recursive technique constructs an (n + 1)-bit codebook from an n-bit codebook [4], [5]. Since for a small n a largest codebook can be obtained easily via the transition pruning technique, a codebook for an n-wire bus can be constructed recursively. 
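
The transition pruning idea can be sketched in a few lines of Python for small n. The example below uses the forbidden-transition condition (no two adjacent wires switch in opposite directions) as the validity rule, purely as an illustration; the five-wire constraints of this paper would need the delay classes instead. The function names and the simple branch-and-bound search are assumptions of this sketch, not the method of [6].

```python
def compatible(u, v, n):
    """True if switching between codewords u and v never drives two adjacent
    wires in opposite directions (the forbidden-transition condition)."""
    for i in range(n - 1):
        u_pair = ((u >> i) & 1, (u >> (i + 1)) & 1)
        v_pair = ((v >> i) & 1, (v >> (i + 1)) & 1)
        both_flip = u_pair[0] != v_pair[0] and u_pair[1] != v_pair[1]
        if both_flip and u_pair[0] != u_pair[1]:
            return False
    return True

def max_clique(n):
    """Exhaustive branch-and-bound search for a largest codebook on n wires."""
    nodes = list(range(2 ** n))
    adj = {u: {v for v in nodes if v != u and compatible(u, v, n)} for u in nodes}
    best = []

    def extend(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique
        if len(clique) + len(candidates) <= len(best):
            return  # this branch cannot beat the best clique found so far
        for idx, v in enumerate(candidates):
            extend(clique + [v], [w for w in candidates[idx + 1:] if w in adj[v]])

    extend([], nodes)
    return best

for n in (2, 3, 4):
    book = max_clique(n)
    print(n, len(book), [format(c, f"0{n}b") for c in book])
```

The search visits subsets of all 2^n codewords, so it is only practical for small n, which is the motivation for the recursive construction.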

B. CAC Design with New Classification 

Since our classification of patterns is different from that in [2], [3], the CAC designs should be reconsidered with our new classification. In the following, we first introduce a recursive method for codebook construction under different constraints, and then derive the size of the codebooks. 

In our work, we use the recursive method to obtain a 
memoryless codebook for the following two reasons. First, 
it is complex to apply the pattern pruning technique, since our 
new classification is based on transitions over five wires, and 
it is not clear which patterns have larger worst-case delays 
and should be removed. Second, it is hard to find a maximum 
clique for a transition graph with a large n. In our method, 
we first start with a 5-bit codebook, obtained by searching 
for maximum cliques in a five-wire bus, and then build an 
(n + 1)-bit codebook by appending '0' and '1' to codewords 
of an n-bit codebook while satisfying delay constraints. 

Our new classifications partition patterns over five adjacent wires into seven classes, C0 to C6, and patterns over four adjacent wires into five classes, 0C to 4C. Similar to the CAC design based on the model in [2], [3], the new classifications are conducive to the design of CACs by eliminating undesirable transition patterns with large worst-case delays. 

To get valid 5-bit codebooks, we first assume the allowed patterns are from C0 to Ci for i = 0, 1, ..., 6 in our classification for middle wires. Then, for the side wires, we assume patterns are from 0C to jC based on the classification for side wires. Under these two assumptions, there are many configurations of constraints, which are referred to as (Ci, jC), where i ∈ {0, 1, ..., 6} and j ∈ {0, 1, ..., 4}. 

Since the worst-case delay of a bus is determined by the largest delay among all wires, for an n-bit (n ≥ 5) bus under (Ci, jC) we require that the worst-case delays on middle wires and side wires are close enough. By our classifications, we find that 0C is close to C0, 1C is close to C2 and C3, 2C is close to C4, 3C is close to C5, and 4C is close to C6. Hence, among all configurations of constraints (Ci, jC), we only focus on (C0, 0C), (C2, 1C), (C3, 1C), (C4, 2C), (C5, 3C), and (C6, 4C). When n ≤ 4, the constraint Ci cannot be enforced. Hence, the constraint (Ci, jC) reduces to jC. The constraint (C0, 0C) appears to be too restrictive, and hence we do not investigate it in this paper. The last configuration (C6, 4C) is trivial, since it allows arbitrary transitions. 

Fig. 5. System model for (a) CACs with memory; (b) memoryless CACs. 

In the following, we propose a scheme for finding an n-bit codebook 𝒞_{(Ci,jC)}(n). For simplicity, we denote 𝒞_{(Ci,jC)}(n) as 𝒞(n) when there is no ambiguity about the constraint. First, for a five-wire bus under constraint (Ci, jC), a pattern transition graph is obtained. We search the graph for the largest 5-bit codebooks. One or two 5-bit codebooks of maximum size exist for each constraint, as listed in Tab. IV, where we denote an n-bit binary codeword (c_1 c_2 ... c_n) as the decimal number Σ_{i=1}^{n} c_i 2^{n−i} for simplicity. In [6], a bit boundary in a set of codewords is said to be 01-type if only codewords with 00, 01, and 11 are allowed across that boundary, and a bit boundary is said to be 10-type when only codewords with 00, 10, and 11 are allowed across that boundary. It is shown that the largest clique for a given constraint has alternating boundary types. Thus, there are two largest cliques. Similarly, from Tab. IV we conjecture that the largest codebooks have alternating constraints, 𝒞_5^0 and 𝒞_5^1, for every five consecutive wires. For constraint (C4, 2C), only one maximum 5-bit codebook exists, and we take 𝒞_5^0 to be the same as 𝒞_5^1 for this constraint. Since we have two types of constraints, two largest codebooks for each constraint can be obtained, except for (C4, 2C), where the two codebooks are the same. Then we apply Alg. 1 to obtain 𝒞(n). In the initialization, we pick the 5-bit codebook 𝒞(5) = 𝒞_5^0. Then, the algorithm recursively appends one bit to the codewords in the codebook in each iteration. For c^k = (c_1 c_2 ... c_k), the appended bit x must be such that the last five bits (c_{k−3} c_{k−2} c_{k−1} c_k x) form a codeword in 𝒞_5^s, which alternates between 𝒞_5^0 and 𝒞_5^1. If we pick the other 5-bit codebook, 𝒞(5) = 𝒞_5^1, we would obtain another codebook. 

Algorithm 1 Codebook design under (Ci, jC) 

Input: 𝒞_5^0, 𝒞_5^1, n; 
Initialize: k = 5, 𝒞(5) = 𝒞_5^0, s = 1; 
while k ≤ n − 1 do 
    for all c^k = (c_1 c_2 ... c_k) ∈ 𝒞(k) do 
        if (c_{k−3} c_{k−2} c_{k−1} c_k 0) ∈ 𝒞_5^s then 
            append 0 to c^k and add the new codeword to 𝒞(k + 1); 
        end if 
        if (c_{k−3} c_{k−2} c_{k−1} c_k 1) ∈ 𝒞_5^s then 
            append 1 to c^k and add the new codeword to 𝒞(k + 1); 
        end if 
    end for 
    s = 1 − s; 
    k = k + 1; 
end while 
Output: 𝒞(n). 
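
A direct Python rendering of Alg. 1 is sketched below. It assumes codewords are stored as integers with c_1 as the most significant bit, that both the 0 and the 1 extension are kept whenever each is valid, and it uses the two 5-bit codebooks quoted for constraint (C3, 1C) as default inputs. The function name `build_codebook` is hypothetical.

```python
# 5-bit codebooks reported for constraint (C3, 1C), in decimal notation.
C5_0 = {0, 3, 14, 15, 24, 30, 31}
C5_1 = {0, 1, 7, 16, 17, 28, 31}

def build_codebook(n, c5_0=C5_0, c5_1=C5_1):
    """Grow a 5-bit codebook to n bits by appending one bit at a time.

    After each append, the last five bits must form a codeword of the
    5-bit codebook selected by s, which alternates between 1 and 0.
    """
    if n < 5:
        raise ValueError("the construction starts from a 5-bit codebook")
    books = (frozenset(c5_0), frozenset(c5_1))
    codebook = sorted(c5_0)
    s = 1
    for _ in range(5, n):
        extended = []
        for word in codebook:
            for bit in (0, 1):
                candidate = (word << 1) | bit
                if candidate & 0b11111 in books[s]:
                    extended.append(candidate)
        codebook = extended
        s = 1 - s
    return codebook

for n in range(5, 9):
    print(n, len(build_codebook(n)))
```

Starting from the other 5-bit codebook amounts to swapping the two arguments, `build_codebook(n, C5_1, C5_0)`.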



The recursive construction allows us to derive the size of the codebooks. Let 𝐕_{(Ci,jC)} be an all-one m-dimensional row vector (m = |𝒞_5^0|) under constraint (Ci, jC). Let c_s^k be a k-bit codeword whose last five consecutive bits (c_{k−4} c_{k−3} c_{k−2} c_{k−1} c_k) ∈ 𝒞_5^s for s = 0 or 1. If a 0 or 1 can be appended to c_s^k to form a (k + 1)-bit codeword whose last five bits (c_{k−3} c_{k−2} c_{k−1} c_k c_{k+1}) ∈ 𝒞_5^{1−s}, such an expansion is called a valid expansion. Otherwise, it is called an invalid expansion. An expansion matrix is an m × m matrix 𝐃^s_{(Ci,jC)}, where 𝐃^s_{(Ci,jC)}(i, j) = 0 denotes an invalid expansion and 𝐃^s_{(Ci,jC)}(i, j) = 1 a valid expansion from the i-th codeword in 𝒞_5^s to the j-th codeword in 𝒞_5^{1−s} under constraint (Ci, jC). Each row of 𝐃^s_{(Ci,jC)} has at most two ones, since each k-bit codeword can be extended to at most two (k + 1)-bit codewords whose last five bits satisfy the appropriate constraints. Let 𝐘 be an m × m anti-diagonal matrix with all ones. Due to the symmetry between 𝒞_5^0 and 𝒞_5^1, 𝐃^0 and 𝐃^1 satisfy 𝐃^1_{(Ci,jC)} = 𝐘𝐃^0_{(Ci,jC)}𝐘. Define 𝐃_{(Ci,jC)} = 𝐃^0_{(Ci,jC)}𝐘 = 𝐘𝐃^1_{(Ci,jC)}. We denote 𝐕_{(Ci,jC)} and 𝐃_{(Ci,jC)} as 𝐕 and 𝐃, respectively, when there is no ambiguity about the constraint. Then, for n ≥ 5, the number of codewords for an n-bit bus is obtained by counting the valid expansions and is given by 

\[
\begin{aligned}
|\mathcal{C}(n)| &= \mathbf{V}\mathbf{D}^0\mathbf{D}^1\cdots\mathbf{V}^T \\
&= \begin{cases}
\mathbf{V}(\mathbf{D}^0\mathbf{Y}\mathbf{Y}\mathbf{D}^1)^{\frac{n-5}{2}}\mathbf{V}^T, & \text{if } n \text{ is odd;} \\
\mathbf{V}(\mathbf{D}^0\mathbf{Y}\mathbf{Y}\mathbf{D}^1)^{\frac{n-6}{2}}\mathbf{D}^0\mathbf{Y}\mathbf{Y}\mathbf{V}^T, & \text{if } n \text{ is even;}
\end{cases} \\
&= \mathbf{V}\mathbf{D}^{n-5}\mathbf{V}^T,
\end{aligned} \tag{6}
\]

where the last step uses 𝐘𝐘 = 𝐈 and 𝐘𝐕^T = 𝐕^T. 
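
The counting formula can be exercised without any linear-algebra library. The sketch below builds 𝐃^0 from two 5-bit codebooks (the pair quoted for constraint (C3, 1C) is used as sample input), forms 𝐃 = 𝐃^0𝐘 by reversing the column order, and sums the entries of 𝐃^{n−5}, which equals 𝐕𝐃^{n−5}𝐕^T. It assumes both codebooks are listed in increasing decimal order, so that the anti-diagonal matrix 𝐘 maps one onto the other. All function names are illustrative.

```python
C5_0 = [0, 3, 14, 15, 24, 30, 31]   # sorted, decimal notation
C5_1 = [0, 1, 7, 16, 17, 28, 31]

def expansion_matrix(src, dst):
    """D(i, j) = 1 if appending one bit to src[i] gives last five bits dst[j]."""
    return [[1 if any(((a << 1) | x) & 0b11111 == b for x in (0, 1)) else 0
             for b in dst] for a in src]

def matmul(p, q):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def codebook_size(n):
    """Evaluate V D^(n-5) V^T with D = D^0 Y (Y reverses the column order)."""
    d0 = expansion_matrix(C5_0, C5_1)
    d = [row[::-1] for row in d0]
    m = len(d)
    power = [[int(i == j) for j in range(m)] for i in range(m)]  # identity
    for _ in range(n - 5):
        power = matmul(power, d)
    return sum(sum(row) for row in power)

print([codebook_size(n) for n in range(5, 12)])
```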

In the following, we first focus on constraints (C3, 1C), 
(C4, 2C), and (C5, 3C). The codes based on these constraints 
are shown to have the same codebooks as OLCs, FPCs, and 
FOCs, respectively. Then, we consider constraint (C2, 1C), 
which would lead to codes with a smaller delay at the expense 
of a lower code rate. 

C. Codes Under (C3, 1C) 

The one lambda codes have a worst-case delay of (1 + λ)τ_0. According to [16], the worst-case delay (1 + λ)τ_0 can be achieved if and only if the transitions ↑↓×, −↑−, and ↑−↑ plus their symmetric and complement versions (e.g., ↑↓× and ×↓↑ are symmetric, and −↓− is the complement of −↑−) are avoided, where ↑, ↓, ×, and − denote 0 → 1, 1 → 0, don't care, and no transition, respectively. The first constraint of avoiding ↑↓× ensures that a transition between any two codewords does not cause opposite transitions on adjacent wires. This condition is referred to as the forbidden-transition (FT) condition. The second constraint of avoiding −↑− ensures that 2C patterns are removed. This constraint ensures that two adjacent bit boundaries cannot both be 01-type or 10-type, and is referred to as the forbidden adjacent boundary pattern (FABP) condition [16]. The last two forbidden patterns give the constraint that no patterns 010 and 101 appear in the codeword, which is referred to as the forbidden-pattern (FP) condition [16]. Codes satisfying these necessary and sufficient conditions are called one lambda codes (OLCs). We denote the largest OLC codebook size for an n-bit bus as G_n, and G_n is given by 

\[
G_n = G_{n-1} + G_{n-5} \tag{7}
\]

with initial conditions G_1 = 2, G_2 = 3, G_3 = 4, G_4 = 5, and G_5 = 7 [16]. 

TABLE IV 

Largest 5-bit codebook(s) under constraint (Ci, jC). 

TABLE V 

Expansion matrix for (C3, 1C), (C4, 2C), and (C5, 3C). 

With our classification, we explore codes under constraint (C3, 1C). From Tab. IV, the two largest 5-bit codebooks are given by $C_5^0$ = {0, 3, 14, 15, 24, 30, 31} and $C_5^1$ = {0, 1, 7, 16, 17, 28, 31}. An $n$-bit codebook $C(n)$ can be obtained via Alg. 1. The number of codewords is given by

$$|C(n)| = \mathbf{V}\mathbf{D}_{(C3,1C)}^{\,n-5}\mathbf{V}^T \quad \text{for } n \geq 5, \qquad (8)$$

where $\mathbf{V}$ is a seven-dimensional all-one vector and $\mathbf{D}_{(C3,1C)}$ is a $7 \times 7$ expansion matrix as shown in Tab. V. We further establish that the largest codebook sizes under constraint (C3, 1C) satisfy the recursion:

Lemma III.1. For $n \geq 8$, $|C_{(C3,1C)}(n)|$ is given by the recursion $|C_{(C3,1C)}(n)| = |C_{(C3,1C)}(n-2)| + |C_{(C3,1C)}(n-3)|$, with initial conditions $|C_{(C3,1C)}(n)| = 7, 9, 12$ for $n = 5, 6, 7$, respectively.



See the appendix for the proof. In fact, we can further relate these codes with OLCs by the following:

Theorem III.1. The codes under (C3, 1C) have the same codebooks as OLCs. Hence, $G_n = |C_{(C3,1C)}(n)|$.

See the appendix for the proof. Theorem III.1 implies that the codes under constraint (C3, 1C) are equivalent to the class of OLCs.
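Theorem III.1 requires the two recursions, Eq. (7) for OLCs and Lemma III.1 for codes under (C3, 1C), to produce the same sizes. The following minimal sketch checks this numerically for $5 \leq n \leq 16$; the function names are hypothetical and only the recursions and initial conditions stated in the text are used.

```python
def olc_sizes_eq7(n_max):
    """Largest OLC codebook sizes G_n from Eq. (7): G_n = G_{n-1} + G_{n-5}."""
    g = {1: 2, 2: 3, 3: 4, 4: 5, 5: 7}  # initial conditions given with Eq. (7)
    for n in range(6, n_max + 1):
        g[n] = g[n - 1] + g[n - 5]
    return g


def c3_1c_sizes_lemma(n_max):
    """Sizes under (C3, 1C) from Lemma III.1: |C(n)| = |C(n-2)| + |C(n-3)|."""
    c = {5: 7, 6: 9, 7: 12}  # initial conditions given in Lemma III.1
    for n in range(8, n_max + 1):
        c[n] = c[n - 2] + c[n - 3]
    return c


g = olc_sizes_eq7(16)
c = c3_1c_sizes_lemma(16)
agree = all(g[n] == c[n] for n in range(5, 17))
print(agree, [c[n] for n in range(5, 17)])
```

The resulting sizes can be compared directly with the OLC column of Tab. VIII.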

D. Codes Under (C4, 2C) 

The $(1+2\lambda)$ codes have a worst-case delay of $(1+2\lambda)\tau$. No necessary and sufficient condition is known for a code to be a $(1+2\lambda)$ code. Two sufficient conditions, FT and FP, have been found, which lead to two families of $(1+2\lambda)$ codes, FTCs and FPCs, respectively. The size of an FTC codebook for an $n$-wire bus is given by $F_{n+2}$, where $F_n$ is the Fibonacci sequence that satisfies $F_{n+2} = F_{n+1} + F_n$ and has initial conditions $F_1 = F_2 = 1$ [6]. The FPCs for an $n$-wire bus have a larger codebook size $2F_{n+1}$ [4].

With our classification, we explore codes under constraint (C4, 2C). From Tab. IV, only one largest 5-bit codebook is found: $C_5^0$ = {0, 1, 3, 6, 7, 12, 14, 15, 16, 17, 19, 24, 25, 28, 30, 31}. An $n$-bit codebook $C(n)$ can be obtained via Alg. 1 by setting $C_5^0 = C_5^1$. The number of codewords is given by

$$|C(n)| = \mathbf{V}\mathbf{D}_{(C4,2C)}^{\,n-5}\mathbf{V}^T \quad \text{for } n \geq 5, \qquad (9)$$

where $\mathbf{V}$ is a 16-dimensional all-one vector and $\mathbf{D}_{(C4,2C)}$ is a $16 \times 16$ expansion matrix as shown in Tab. V. We further establish that the largest codebook sizes under constraint (C4, 2C) satisfy the recursion:

Lemma III.2. For $n \geq 9$, $|C_{(C4,2C)}(n)|$ can be simplified as the recursion $|C_{(C4,2C)}(n)| = 2|C_{(C4,2C)}(n-1)| - |C_{(C4,2C)}(n-2)| + |C_{(C4,2C)}(n-4)|$, with boundary conditions $|C_{(C4,2C)}(n)| = 16, 26, 42, 68$ for $n = 5, 6, 7, 8$, respectively.

See the appendix for the proof. Again, we can relate these 
codes to existing CACs by the following: 

Theorem III.2. The codes under (C4, 2C) have the same codebooks as FPCs. Hence, $2F_{n+1} = |C_{(C4,2C)}(n)|$.

See the appendix for the proof. Since FPCs and our codes under (C4, 2C) can be obtained by excluding D3 plus D4 patterns and C5 plus C6 patterns, respectively, Theorem III.2 is not surprising given that C5 and C6 are the same as D3 and D4, respectively. Theorem III.2 implies that results in the literature regarding FPCs are also applicable to codes under constraint (C4, 2C).
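The size claims in this subsection can be cross-checked by brute force. The sketch below is an illustration, not the paper's construction: it counts the $n$-bit words that contain neither 010 nor 101 (the FP condition as described in the text) and compares the counts with the closed form $2F_{n+1}$ and with the recursion of Lemma III.2. The helper names are hypothetical.

```python
from itertools import product


def fpc_size_bruteforce(n):
    """Count n-bit words that contain neither 010 nor 101 (the FP condition)."""
    count = 0
    for bits in product("01", repeat=n):
        word = "".join(bits)
        if "010" not in word and "101" not in word:
            count += 1
    return count


def fibonacci(k):
    """F_k with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


sizes = {n: fpc_size_bruteforce(n) for n in range(5, 13)}
matches_closed_form = all(sizes[n] == 2 * fibonacci(n + 1) for n in sizes)
matches_lemma = all(
    sizes[n] == 2 * sizes[n - 1] - sizes[n - 2] + sizes[n - 4]
    for n in range(9, 13)
)
print([sizes[n] for n in (5, 6, 7, 8)], matches_closed_form, matches_lemma)
```

Exhaustive enumeration is only practical for small $n$; it is used here because it does not depend on the expansion matrix of Tab. V.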

E. Codes Under (C5, 3C) 

The $(1+3\lambda)$ codes have a worst-case delay of $(1+3\lambda)\tau$, which can be achieved if and only if ↓↑↓ and ↑↓↑ are avoided. So the necessary and sufficient condition for the $(1+3\lambda)$ codes is that the codebook cannot have both 010 and 101 appearing centered around any bit position, which is referred to as a forbidden-overlap (FO) condition. Codes satisfying the FO condition are called FOCs. It is shown that the largest FOC codebook size for an $n$-bit bus is given by $T_{n+2}$, where $T_n = T_{n-1} + T_{n-2} + T_{n-3}$ is the tribonacci number sequence with initial conditions $T_1 = 1$, $T_2 = 1$, and $T_3 = 2$ [16].

With our classification, we explore codes under constraint (C5, 3C). Two largest 5-bit codebooks $C_5^0$ = {0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 24, 25, 26, 27, 28, 30, 31} and $C_5^1$ = {0, 1, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 28, 29, 30, 31} are found. Via Alg. 1, an $n$-bit codebook $C(n)$ can be obtained. The number of codewords is given by

$$|C(n)| = \mathbf{V}\mathbf{D}_{(C5,3C)}^{\,n-5}\mathbf{V}^T \quad \text{for } n \geq 5, \qquad (10)$$

where $\mathbf{V}$ is a 24-dimensional all-one vector and $\mathbf{D}_{(C5,3C)}$ is a $24 \times 24$ expansion matrix as shown in Tab. V.

We further establish that the largest codebook sizes under constraint (C5, 3C) satisfy the recursion:

Lemma III.3. For $n \geq 8$, $|C_{(C5,3C)}(n)|$ can be simplified as the recursion $|C_{(C5,3C)}(n)| = |C_{(C5,3C)}(n-1)| + |C_{(C5,3C)}(n-2)| + |C_{(C5,3C)}(n-3)|$, with boundary conditions $|C_{(C5,3C)}(n)| = 24, 44, 81$ for $n = 5, 6, 7$, respectively.

See the appendix for the proof. Again we can relate these 
codes to existing CACs by the following: 

Theorem III.3. The codes under (C5, 3C) have the same codebooks as FOCs. Hence, $T_{n+2} = |C_{(C5,3C)}(n)|$.

See the appendix for the proof. Theorem III.3 is not surprising, since FOCs and our codes under (C5, 3C) can be obtained by excluding D4 and C6 patterns, respectively, and D4 and C6 have been shown to be the same. Theorem III.3 implies that results in the literature regarding FOCs are also applicable to codes under constraint (C5, 3C).
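As a quick consistency check on this subsection, the sketch below generates the tribonacci numbers with the initial conditions quoted in the text and confirms that $T_{n+2}$ reproduces the boundary values 24, 44, 81 of Lemma III.3, and that the three-term sum recursion carries over to the codebook sizes. The function name is hypothetical.

```python
def tribonacci(k):
    """T_k with T_1 = 1, T_2 = 1, T_3 = 2 and T_k = T_{k-1} + T_{k-2} + T_{k-3}."""
    t = [1, 1, 2]
    while len(t) < k:
        t.append(t[-1] + t[-2] + t[-3])
    return t[k - 1]


# Largest FOC codebook size for an n-bit bus is T_{n+2}.
foc_sizes = {n: tribonacci(n + 2) for n in range(5, 13)}
lemma_ok = all(
    foc_sizes[n] == foc_sizes[n - 1] + foc_sizes[n - 2] + foc_sizes[n - 3]
    for n in range(8, 13)
)
print([foc_sizes[n] for n in (5, 6, 7)], lemma_ok)
```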



F. Codes Under (C2, 1C) 

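With our classification, we explore codes under constraint (C2, 1C). From Tab. IV, the two largest 5-bit codebooks are given by $C_5^0$ = {00000, 00011, 01111, 11000, 11110, 11111} and $C_5^1$ = {00000, 00001, 00111, 10000, 11100, 11111}. An $n$-bit codebook $C(n)$ can be obtained via Alg. 1. The number of codewords is given by

$$|C(n)| = \mathbf{V}\mathbf{D}^{\,n-5}\mathbf{V}^T \quad \text{for } n \geq 5, \qquad (11)$$

where $\mathbf{V}$ is a six-dimensional all-one vector and

$$
\mathbf{D} = \begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
$$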

We further establish that the largest codebook sizes under 
constraint (C2, 1C) satisfy the recursion: 

Lemma III.4. For $n \geq 10$, $|C_{(C2,1C)}(n)|$ can be simplified as the recursion $|C_{(C2,1C)}(n)| = |C_{(C2,1C)}(n-2)| + |C_{(C2,1C)}(n-5)|$, with initial conditions $|C_{(C2,1C)}(n)| = 6, 7, 9, 11, 14$ for $n = 5, 6, 7, 8, 9$, respectively.

See the appendix for the proof. 

Lemma III.5. The codebook under (C2, 1C) is a subset of 
OLC. 

See the appendix for the proof. 
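The counting formula of Eq. (11) can be reproduced from the two 5-bit codebooks alone. The sketch below is a minimal illustration under two assumptions that are not stated in the text: rows and columns of the expansion matrix follow the ascending order of the codebooks, and appending a bit shifts the 5-bit window left by one position. It builds $\mathbf{D} = \mathcal{D}^0\mathbf{Y}$, evaluates $\mathbf{V}\mathbf{D}^{n-5}\mathbf{V}^T$, and checks the matrix identity $\mathbf{D}^6 = \mathbf{D}^4 + \mathbf{D}$ used in the proof of Lemma III.4. All function names are hypothetical.

```python
def expansion_matrix(c0, c1):
    """Build D = D0 * Y for 5-bit codebooks c0 and c1 (ascending order assumed)."""
    rows = []
    for w in c0:
        # Append one bit and keep the last five bits of the result.
        successors = {((w << 1) & 0b11111) | bit for bit in (0, 1)}
        d0_row = [1 if t in successors else 0 for t in c1]
        rows.append(d0_row[::-1])  # right-multiplying by Y reverses the columns
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matpow(m, k):
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = matmul(result, m)
    return result


def codebook_size(d, n):
    """|C(n)| = V D^(n-5) V^T, i.e. the sum of all entries of D^(n-5)."""
    return sum(map(sum, matpow(d, n - 5)))


C0 = [0b00000, 0b00011, 0b01111, 0b11000, 0b11110, 0b11111]
C1 = [0b00000, 0b00001, 0b00111, 0b10000, 0b11100, 0b11111]
D = expansion_matrix(C0, C1)
sizes = [codebook_size(D, n) for n in range(5, 11)]
d4_plus_d = [[x + y for x, y in zip(r4, r1)] for r4, r1 in zip(matpow(D, 4), D)]
identity_holds = matpow(D, 6) == d4_plus_d
print(sizes, identity_holds)
```

Plain nested lists are used instead of a numerical library so that the sketch has no dependencies; for the larger matrices of Tab. V a matrix library would be the more natural choice.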

G. Pruned Codes Under (C2, 1C) 

For (C2, 1C), the restriction on the side wires is more 
relaxed than that on the middle wires, which results in larger 
worst-case delays for the side wires. Hence, we prune the 
CACs under constraint (C2, 1C) by removing codewords with 
larger delays on the side wires in order to achieve a smaller 
worst-case delay. Since the pruned codes have a smaller delay 
than OLCs, we call these pruned CACs improved one Lambda 
codes (IOLCs). We obtain IOLCs by first finding an $n$-bit codebook via Alg. 1 as in Sec. III-F and then pruning the codebook with Alg. 2. To prune the codebook $C(n)$, we search for maximum subsets of $C_5^i$ ($i = 0, 1$) with smaller delays on the side wires. For $C_5^0$, two maximum subsets $C_5^{0,0}$ = {0, 3, 15, 30, 31} and $C_5^{4,0}$ = {0, 15, 24, 30, 31} are found with smaller worst-case delays on wires 1 and 2 and wires 4 and 5, respectively. For $C_5^1$, a maximum subset $C_5^{4,1}$ = {0, 1, 7, 16, 31} is found with smaller worst-case delays on wires 4 and 5. Finally, a valid $n$-bit codebook is obtained with the leftmost five bits belonging to $C_5^{0,0}$, and the rightmost five bits belonging to $C_5^{4,0}$ or $C_5^{4,1}$ depending on whether $n$ is odd or even.

The pruning algorithm for CACs under (C2, 1C) on an $n$-bit bus is shown in Alg. 2. By examining every codeword $c_n$ in $C(n)$, the algorithm removes codewords with larger delays on the side wires. With Alg. 2, we get an $n$-bit IOLC under constraint (C2, 1C), and its size is given by

$$|C_{IOLC}(n)| = \mathbf{W}_1\mathbf{D}^{\,n-5}\mathbf{Y}\mathbf{W}_2^T \quad \text{for } n \geq 5, \qquad (12)$$

where $\mathbf{W}_1$ = [1 1 1 1 1], $\mathbf{W}_2$ = [1 1 1 1 1], and $\mathbf{D}$ is the same as that in Eq. (11). Note that $\mathbf{W}_1$ and $\mathbf{W}_2$ are used instead of $\mathbf{V}$, because of the pruning of valid patterns on the side wires.

Algorithm 2 Pruning CACs under (C2, 1C)

Input: $C_5^{0,0}$, $C_5^{4,0}$, $C_5^{4,1}$, $C(n)$;
if $n$ is odd then
    $i = 1$;
else
    $i = 0$;
end if
for all $c_n = (c_1 c_2 \cdots c_n) \in C(n)$ do
    if $(c_1 c_2 c_3 c_4 c_5) \notin C_5^{0,0}$ or $(c_{n-4} c_{n-3} c_{n-2} c_{n-1} c_n) \notin C_5^{4,i}$ then
        eliminate $c_n$ from $C(n)$;
    end if
end for
Output: $C(n)$.
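The filtering step of Alg. 2 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: codewords are represented as bit strings, the function name `prune` is hypothetical, and the choice of the right-hand subset (the index $i$ in Alg. 2) is left to the caller instead of being derived from the parity of $n$.

```python
def prune(codebook, left_ok, right_ok):
    """Keep the codewords whose first and last five bits are both allowed.

    codebook: iterable of equal-length bit strings (at least 5 bits each)
    left_ok:  allowed 5-bit prefixes as integers (plays the role of C_5^{0,0})
    right_ok: allowed 5-bit suffixes as integers (plays the role of C_5^{4,i})
    """
    kept = []
    for word in codebook:
        prefix = int(word[:5], 2)
        suffix = int(word[-5:], 2)
        if prefix in left_ok and suffix in right_ok:
            kept.append(word)
    return kept


# The unpruned 5-bit codebook C_5^0 under (C2, 1C) and two subsets from the text.
c5_0 = ["00000", "00011", "01111", "11000", "11110", "11111"]
left_ok = {0, 3, 15, 30, 31}     # C_5^{0,0}
right_ok = {0, 15, 24, 30, 31}   # C_5^{4,0}
pruned = prune(c5_0, left_ok, right_ok)
print(pruned)
```

With this particular choice of subsets, four of the six 5-bit codewords survive, which agrees with the initial condition $|C_{IOLC}(5)| = 4$ in Lemma III.6.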



We further establish that the largest codebook sizes of IOLCs satisfy the recursion:

Lemma III.6. For $n \geq 10$, $|C_{IOLC}(n)|$ can be simplified as the recursion $|C_{IOLC}(n)| = |C_{IOLC}(n-2)| + |C_{IOLC}(n-5)|$, with initial conditions $|C_{IOLC}(n)| = 4, 5, 7, 8, 11$ for $n = 5, 6, 7, 8, 9$, respectively.

This recursion is the same as that in Lemma III.4. It can be proved in the same fashion as Lemma III.4, and hence its proof is omitted.

Lemma III.7. The IOLC codebook is a subset of OLC.

See the appendix for the proof.

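TABLE VI

Simulated delays of our IOLC, unpruned (C2, 1C) code, and OLC [5] for a 10-bit bus ($\lambda$ = 12.24 and $\tau_0$ = 1.42 ps).

| Wire i | IOLCs delay (ps) | (C2, 1C) delay (ps) | OLCs delay (ps) |
|---|---|---|---|
| 1 | 10.08 | 5.49 | 10.55 |
| 2 | 7.03 | 9.13 | 2.92 |
| 3 | 9.31 | 9.31 | 5.94 |
| 4 | 9.31 | 9.45 | 6.09 |
| 5 | 9.59 | 9.36 | 10.73 |
| 6 | 9.41 | 9.41 | 13.64 |
| 7 | **10.14** | 10.14 | 14.06 |
| 8 | 9.65 | 10.57 | **14.84** |
| 9 | 8.97 | 9.14 | 8.99 |
| 10 | 5.28 | **13.50** | **14.84** |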

IV. Performance Evaluation 

In this section, we evaluate the performance of CACs based on our classification with extensive simulations, and compare them with existing CACs. Each CAC has two key performance metrics: delay and rate. The delay of a CAC is the worst-case delay when the codewords from the CAC are transmitted over the bus. Codebook size and code rate are often used to measure the overhead of CACs. The codebook size of a CAC is simply the number of codewords. Suppose a CAC of size $M$ is transmitted over an $n$-bit bus; then its rate is defined as $\lfloor \log_2 M \rfloor / n$. A CAC of rate $k/n$ implies that $n - k$ extra wires are used in addition to $k$ data wires so as to reduce the crosstalk delay. Hence, the code rate measures the area and power overhead of CACs: the higher the rate, the smaller the overhead. Obviously, there is a tradeoff between the code rate and delay of a CAC: typically a lower rate code is needed to achieve a smaller delay. To measure the overall effects of both rate and delay, we also define the throughput of a CAC as the ratio of code rate and delay. The assumptions for this definition are: (1) the clock rate of the bus is determined by the inverse of the worst-case delay; (2) the throughput of the bus is linearly proportional to $k$, the number of data wires.

TABLE VII

Simulated delays of our IOLC, unpruned (C2, 1C) code, and OLC [5] for a 16-bit bus ($\lambda$ = 12.24 and $\tau_0$ = 1.42 ps).

| Wire i | IOLCs delay (ps) | (C2, 1C) delay (ps) | OLCs delay (ps) |
|---|---|---|---|
| 1 | 10.32 | **13.92** | 15.95 |
| 2 | 7.43 | 9.51 | 10.03 |
| 3 | 9.57 | 10.88 | 15.54 |
| 4 | 9.83 | 10.21 | 15.75 |
| 5 | 10.16 | 10.16 | 15.02 |
| 6 | 10.33 | 10.34 | 15.57 |
| 7 | 10.39 | 10.39 | 15.70 |
| 8 | 10.23 | 10.23 | 15.48 |
| 9 | 9.87 | 10.25 | 15.57 |
| 10 | **10.40** | 10.39 | 15.66 |
| 11 | 10.34 | 10.33 | 15.52 |
| 12 | 10.17 | 10.21 | 14.88 |
| 13 | 10.25 | 10.39 | 15.85 |
| 14 | 9.98 | 10.92 | 15.59 |
| 15 | 9.61 | 9.62 | 10.13 |
| 16 | 5.58 | **13.92** | **16.11** |
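The rate and throughput definitions of this section translate directly into a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the inputs are the 10-bit codebook sizes from Tab. VIII together with the worst-case delays reported for the 10-bit bus.

```python
def code_rate(codebook_size, n_wires):
    """Rate floor(log2 M) / n of a CAC with M codewords on an n-wire bus."""
    data_bits = codebook_size.bit_length() - 1  # exact floor(log2 M) for M >= 1
    return data_bits / n_wires


def throughput(codebook_size, n_wires, worst_case_delay_ps):
    """Throughput as defined in the text: code rate divided by worst-case delay."""
    return code_rate(codebook_size, n_wires) / worst_case_delay_ps


def throughput_gain(cac, reference):
    """Ratio of two throughputs; each argument is (size, wires, delay_ps)."""
    return throughput(*cac) / throughput(*reference)


# 10-bit bus: (codebook size, number of wires, worst-case delay in ps).
iolc_10 = (12, 10, 10.14)
unpruned_10 = (17, 10, 13.50)
olc_10 = (28, 10, 14.84)
print(round(throughput_gain(iolc_10, olc_10), 2))      # IOLC relative to OLC
print(round(throughput_gain(unpruned_10, olc_10), 2))  # unpruned (C2, 1C) relative to OLC
```

Both ratios round to the throughput gain of 1.10 listed for the 10-wire bus in Tab. VIII. `int.bit_length` is used instead of a floating-point logarithm so that the floor is exact for every codebook size.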

Since codes under (C3, 1C), (C4, 2C), and (C5, 3C) have exactly the same codebooks as OLCs, FPCs, and FOCs, their delay, rate, and throughput are also the same. Under constraint (C2, 1C), we propose two kinds of codes, unpruned codes and pruned codes (IOLCs). In the following, we compare their performance with OLCs in [5] with extensive simulations.

To compare the worst-case delay of our IOLCs, unpruned (C2, 1C) codes, and OLCs, we simulate two buses, a 10-bit bus and a 16-bit bus, with all transitions between any two codewords in their codebooks and obtain the worst-case delay of each wire. The simulation environment has been explained in Sec. II-C. Both buses have a length of 5 mm, with $\tau_0$ = 1.42 ps and $\lambda$ = 12.24. The simulation results are shown in Tabs. VI and VII, where for each CAC the largest delays among all wires are in boldface. As commented above for unpruned (C2, 1C) codes, the delays of the two outermost wires are significantly greater than those of other wires. For a 10-bit bus, the worst-case delays of our IOLC, unpruned (C2, 1C) code, and an OLC are given by 10.14 ps, 13.50 ps, and 14.84 ps, respectively. The worst-case delays of our IOLC and unpruned (C2, 1C) code are 31.67% and 9.03% smaller than that of the OLC, respectively. For a 16-bit bus, the worst-case delays of our IOLC, unpruned (C2, 1C) code, and an OLC are given by 10.40 ps, 13.92 ps, and 16.11 ps, respectively. The worst-case delays of our IOLC and unpruned (C2, 1C) code are 35.44% and 13.59% smaller than that of the OLC, respectively.
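The quoted percentages are relative reductions with respect to the OLC worst-case delay. The worked arithmetic below reproduces them from the delays stated in the paragraph above; nothing beyond those six numbers is assumed.

```latex
\begin{align*}
\text{10-bit bus:}\quad
  \frac{14.84 - 10.14}{14.84} &\approx 31.67\%, &
  \frac{14.84 - 13.50}{14.84} &\approx 9.03\%, \\
\text{16-bit bus:}\quad
  \frac{16.11 - 10.40}{16.11} &\approx 35.44\%, &
  \frac{16.11 - 13.92}{16.11} &\approx 13.59\%.
\end{align*}
```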

For all simulations, our IOLCs have better delay performance than OLCs. Although both IOLCs and unpruned (C2, 1C) codes have almost the same code rate and better delay performance than OLCs, the delay performance of IOLCs is much better than that of the unpruned (C2, 1C) codes. With a more advanced technology where the coupling effect is significant, the improvement of our IOLCs is bigger.

TABLE VIII

Comparison of codebook size and throughput of IOLC, unpruned (C2, 1C) code, and OLC ($\lambda$ = 12.24 and $\tau_0$ = 1.42 ps).

| # of wires | IOLC # of words | IOLC # of bits | IOLC throughput gain | (C2, 1C) # of words | (C2, 1C) # of bits | (C2, 1C) throughput gain | OLC # of words | OLC # of bits |
|---|---|---|---|---|---|---|---|---|
| 5 | 4 | 2 | 1.55 | 6 | 2 | 1.10 | 7 | 2 |
| 6 | 5 | 2 | 1.07 | 7 | 2 | 0.78 | 9 | 3 |
| 7 | 7 | 2 | 1.02 | 9 | 3 | 1.14 | 12 | 3 |
| 8 | 8 | 3 | 1.12 | 11 | 3 | 0.84 | 16 | 4 |
| 9 | 11 | 3 | 1.10 | 14 | 3 | 0.84 | 21 | 4 |
| 10 | 12 | 3 | 1.10 | 17 | 4 | 1.10 | 28 | 4 |
| 11 | 16 | 4 | 1.18 | 21 | 4 | 0.88 | 37 | 5 |
| 12 | 18 | 4 | 1.19 | 26 | 4 | 0.89 | 49 | 5 |
| 13 | 23 | 4 | 1.03 | 32 | 5 | 0.96 | 65 | 6 |
| 14 | 27 | 4 | 1.02 | 40 | 5 | 0.95 | 86 | 6 |
| 15 | 34 | 5 | 1.27 | 49 | 5 | 0.95 | 114 | 6 |
| 16 | 41 | 5 | 1.11 | 61 | 5 | 0.83 | 151 | 7 |

The comparisons of the codebook size between our IOLCs, unpruned (C2, 1C) codes, and OLCs [5] and the throughput gain with respect to OLCs are shown in Tab. VIII. The throughput gain of our CACs with respect to OLCs is given by the ratio between the throughput of our CACs and the throughput of OLCs. The codebook sizes of the three codes are close. In all cases, the difference of the number of bits between our IOLCs and unpruned (C2, 1C) codes is within 1 bit. The difference of the number of bits between our IOLCs and OLCs [5] is within 2 bits for $n \leq 16$. With respect to throughput, our IOLCs always have a greater throughput than OLCs, and their throughput gain ranges from 1.02 to 1.55 for an $n$-wire bus ($5 \leq n \leq 16$). The unpruned (C2, 1C) codes have better throughput in some cases than OLCs, and the throughput gain ranges from 0.78 to 1.10 for an $n$-wire bus ($5 \leq n \leq 16$). When unpruned (C2, 1C) codes have a lower throughput than OLCs, IOLCs can be used.

Our IOLCs and unpruned (C2, 1C) codes provide additional 
options for the tradeoff between code rate and code delay. In 
addition to achieving higher throughputs, the new CACs are 
also appropriate for interconnects where the delay is of top 
priority. 

It has been shown that the encoding and decoding of OLCs, 
FPCs, and FOCs have quadratic complexity based on numeral 
systems (TT]. Since codes under (C3, 1C), (C4, 2C), and 
(C5,3C) have exactly the same codebooks as OLCs, FPCs, 
and FOCs, their CODECs also have quadratic complexity. 
Also, it is expected that the encoding and decoding of our 
IOLCs and unpruned (C2, 1C) codes have a quadratic com- 
plexity, since the codebooks of our IOLCs and unpruned 
(C2, 1C) codes are proper subsets of OLCs. 

We remark that the simulation results in Sections II-C and IV are all based on a 45 nm CMOS technology. We have also run the same set of simulations based on a 0.1-μm technology (omitted for brevity). Between the two sets of simulation results, the main conclusions of the manuscript and the key features of our proposed classification and CACs remain the same. For instance, the delays of the patterns in different classes do not overlap, regardless of the technology. Also, the proposed CACs based on the new classification are the same. This demonstrates that our approach to delay classification and CACs is applicable to a wide variety of technologies. This is because in our approach, the dependency of the crosstalk delay on the technology is represented by two parameters, the propagation delay $\tau_0$ of a wire free of crosstalk and the coupling factor $\lambda$. Since our analytical approach to the classification and CACs treats these two parameters as variables, our approach can be easily adapted to a wide variety of technologies.

V. CONCLUSIONS 

In this paper, we propose a new classification of transition 
patterns. The new classification has finer classes and the 
delays do not overlap among different classes. Hence the new 
classification is conducive to the design of CACs. To illustrate 
this, we design a family of CACs with different constraints. 
Some codes of the family are the same as existing codes, 
OLCs, FPCs, and FOCs. We also propose two new CACs with 
a smaller worst-case delay and better throughput than OLCs. 
Since our analytical approach to the classification and CACs 
treats the technology-dependent parameters as variables, our 
approach can be easily adapted to a wide variety of technology. 

Appendix 

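Proof of Lemma III.1: The eigenvalues of $\mathbf{D}$ are given by solving $\det|\lambda\mathbf{I} - \mathbf{D}| = 0$. By the Cayley-Hamilton theorem, $\mathbf{D}$ satisfies its own characteristic equation. Then,

$$
\begin{aligned}
\det|\lambda\mathbf{I} - \mathbf{D}| = 0
&\Rightarrow \lambda^7 - \lambda^5 - \lambda^4 = 0 \\
&\Rightarrow \mathbf{D}^7 = \mathbf{D}^5 + \mathbf{D}^4 \\
&\Rightarrow \mathbf{V}\mathbf{D}^7\mathbf{V}^T = \mathbf{V}\mathbf{D}^5\mathbf{V}^T + \mathbf{V}\mathbf{D}^4\mathbf{V}^T \\
&\Rightarrow |C(n)| = |C(n-2)| + |C(n-3)|.
\end{aligned}
$$

For $n = 5, 6, 7$, the boundary conditions can be obtained by Eq. (8) as $|C(5)| = 7$, $|C(6)| = 9$, and $|C(7)| = 12$. Thus, the lemma holds for $n \geq 8$. ■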
Proof of Theorem III.1: It has been shown that an $(n+1)$-bit OLC codebook $C(n+1)$ can be constructed from an $n$-bit codebook $C(n)$. The necessary and sufficient condition for OLCs defines the same expansion matrix as our codes. The OLC construction is the same as that of our codes under (C3, 1C) shown in Alg. 1. For $n = 5$, the OLC codebooks are the same as our codes under (C3, 1C). So, for an $n$-bit bus ($n \geq 5$), codes under constraint (C3, 1C) are the same as OLCs. For an $n$-bit bus ($n \leq 4$), the constraint (C3, 1C) reduces to 1C, and leads to the same codebooks as OLCs. Hence, our codes under (C3, 1C) have the same codebooks as OLCs, which implies that $G_n = |C(n)|$. ■

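Proof of Lemma III.2: The eigenvalues of $\mathbf{D}$ are given by solving $\det|\lambda\mathbf{I} - \mathbf{D}| = 0$. Then,

$$
\begin{aligned}
\det|\lambda\mathbf{I} - \mathbf{D}| = 0
&\Rightarrow \mathbf{D}^{16} = 2\mathbf{D}^{15} - \mathbf{D}^{14} + \mathbf{D}^{12} \\
&\Rightarrow \mathbf{V}\mathbf{D}^{16}\mathbf{V}^T = 2\mathbf{V}\mathbf{D}^{15}\mathbf{V}^T - \mathbf{V}\mathbf{D}^{14}\mathbf{V}^T + \mathbf{V}\mathbf{D}^{12}\mathbf{V}^T \\
&\Rightarrow |C(n)| = 2|C(n-1)| - |C(n-2)| + |C(n-4)|.
\end{aligned}
$$

For $n = 5, 6, 7, 8$, the boundary conditions can be obtained by Eq. (9) as $|C(5)| = 16$, $|C(6)| = 26$, $|C(7)| = 42$, and $|C(8)| = 68$. Thus, the lemma holds for $n \geq 9$. ■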

Proof of Theorem III.2: It has been shown that an $(n+1)$-bit FPC codebook $C(n+1)$ can be constructed from an $n$-bit codebook $C(n)$ [4]. The sufficient condition (FP condition) for FPCs defines the same expansion matrix as our codes. The FPC construction is the same as that of our codes under (C4, 2C) shown in Alg. 1. For $n = 5$, the FPC codebooks are the same as our codes under (C4, 2C). So, for an $n$-bit bus ($n \geq 5$), codes under constraint (C4, 2C) are the same as FPCs. For an $n$-bit bus ($n \leq 4$), the constraint (C4, 2C) reduces to 2C, and leads to the same codebooks as FPCs. Hence, our codes under (C4, 2C) have the same codebooks as FPCs, which implies that $2F_{n+1} = |C(n)|$. ■

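Proof of Lemma III.3: The eigenvalues of $\mathbf{D}$ are given by solving $\det|\lambda\mathbf{I} - \mathbf{D}| = 0$. Then,

$$
\begin{aligned}
\det|\lambda\mathbf{I} - \mathbf{D}| = 0
&\Rightarrow \mathbf{D}^{24} = \mathbf{D}^{23} + \mathbf{D}^{22} + \mathbf{D}^{21} \\
&\Rightarrow \mathbf{V}\mathbf{D}^{24}\mathbf{V}^T = \mathbf{V}\mathbf{D}^{23}\mathbf{V}^T + \mathbf{V}\mathbf{D}^{22}\mathbf{V}^T + \mathbf{V}\mathbf{D}^{21}\mathbf{V}^T \\
&\Rightarrow |C(n)| = |C(n-1)| + |C(n-2)| + |C(n-3)|.
\end{aligned}
$$

For $n = 5, 6, 7$, the boundary conditions can be obtained by Eq. (10) as $|C(5)| = 24$, $|C(6)| = 44$, and $|C(7)| = 81$. Thus, the lemma holds for $n \geq 8$. ■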

Proof of Theorem III.3: It has been shown that an $(n+1)$-bit FOC codebook $C(n+1)$ can be constructed from an $n$-bit codebook $C(n)$ [4]. The necessary and sufficient condition (FO condition) for FOCs defines the same expansion matrix as our codes. The FOC construction is the same as that of our codes under (C5, 3C) shown in Alg. 1. For $n = 5$, the FOC codebooks are the same as our codes under (C5, 3C). So, for an $n$-bit bus ($n \geq 5$), codes under constraint (C5, 3C) are the same as FOCs. For an $n$-bit bus ($n \leq 4$), the constraint (C5, 3C) reduces to 3C, and leads to the same codebooks as FOCs. Hence, our codes under (C5, 3C) have the same codebooks as FOCs, which implies that $T_{n+2} = |C(n)|$. ■

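Proof of Lemma III.4: The eigenvalues of $\mathbf{D}$ are given by solving $\det|\lambda\mathbf{I} - \mathbf{D}| = 0$. Then,

$$
\begin{aligned}
\det|\lambda\mathbf{I} - \mathbf{D}| = 0
&\Rightarrow \mathbf{D}^6 = \mathbf{D}^4 + \mathbf{D} \\
&\Rightarrow \mathbf{V}\mathbf{D}^6\mathbf{V}^T = \mathbf{V}\mathbf{D}^4\mathbf{V}^T + \mathbf{V}\mathbf{D}\mathbf{V}^T \\
&\Rightarrow |C(n)| = |C(n-2)| + |C(n-5)|.
\end{aligned}
$$

For $n = 5, 6, 7, 8, 9$, the boundary conditions can be obtained by Eq. (11) as $|C(5)| = 6$, $|C(6)| = 7$, $|C(7)| = 9$, $|C(8)| = 11$, and $|C(9)| = 14$. Thus, the lemma holds for $n \geq 10$. ■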

Proof of Lemma III.5: As shown in Tab. IV, $C_5^i$ under (C2, 1C) is a subset of $C_5^i$ under (C3, 1C) for $i = 0, 1$. Thus, the valid expansions from $C_5^i$ to $C_5^{1-i}$ under (C2, 1C) are part of those under (C3, 1C). So, for an $n$-bit bus, $C_{(C2,1C)}(n) \subseteq C_{(C3,1C)}(n)$. According to Thm. III.1, the $n$-bit codebook $C_{(C2,1C)}(n)$ is a subset of an OLC codebook. ■

Proof of Lemma III.7: Since the IOLC codebook is a subset of the unpruned codes under (C2, 1C), this follows from Lemma III.5. ■